Scant attention has been paid to measurement error in frog morphometric
studies. We study both interobserver effects of measurement on the
same specimens of Vanzolinius discodactylus (Anura, Leptodactylidae) 
and intraobserver effect of repeated measurements on a single V. discodac- 
tylus specimen. Interobserver measurements differ statistically and result 
in different biological interpretations in some cases. Evidence is provided 
that log transformation of raw data is often unnecessary. Allometric trans- 
formation of measurement variables to remove size effect requires parallel 
regression slopes of variable against size. This requirement is not met with 
the V. discodactylus data, nor is it likely to be met when several variables 
are used in a morphometric study. We recommend: assume measurement 
differences between sexes in frogs and analyze data separately by sex; 
consider and select the most appropriate statistical model options for data 
analyses; avoid pseudoprecise measurements; do not rush to logarithmic 
transformation; remeasure at least one individual frog 20 times to provide 
an assessment of measurement error in data interpretation; be conservative 
in drawing biological inferences from morphometric analyses, basing inter- 
pretations and conclusions only on very robust effect size estimates and 
differences. 


INTRODUCTION 


Frogs are relatively soft-bodied organisms and their preservation requires considerable
care. Limbs and body must be correctly positioned to achieve standardized preparation.
Unfortunately, different preservatives and different individual techniques result in very
different museum preparations for the same species (fig. 1). Therefore, precise, comparable
measurements of preserved frogs are difficult. For example, one of the standard measurements
taken on frogs, snout-vent length (SVL), is somewhat problematic in larger preserved
frogs, because the sacral-urostyle portion of the body usually is fixed at an obtuse angle to the
vertebral column. How much one “straightens out” the preserved animal has an effect on the
Fig. 1. Thoropa miliaris (USNM 38936 on left, USNM 229848 on right) showing preservation
positioning differences that make accurate, comparable measurements difficult.


resultant measurement. In spite of (or, perhaps oblivious to) these difficulties, researchers
have used frog measurement data to address a variety of scientific questions. There has been
little attention paid to precision and repeatability of frog measurement data and how this
variation might affect the scientific questions being addressed.


We know of only one study (Lee, 1982) that demonstrated important measurement
differences between fresh and preserved frogs and differences in measurements taken on the
same individuals at the same state of preservation. In that study, Lee took all the measurements
himself using the same measuring equipment and methodology throughout. Although
Lee (1982) presented extensive literature on the effects of preservation technique on fish
morphology and discussed its relevance to frog morphometrics, herpetologists have generally
ignored his warnings.
We are not aware of any published studies of the effect of different individual researchers
taking the same set of measurements on the same frogs to measure inter-observer variability
(although A. Dubois and A. Ohler have unpublished data on this topic, personal communication).
Studies on other groups of organisms demonstrate that such differences are not
trivial. Lee (1990) found differences in precision between two observers on scale count data
taken from the same lizards. Yezerinac et al. (1992) found that measurement error varied
considerably, depending on the variable, for bird skeleton measurement data. In these studies,
a constant value was being measured. That is, the number of scales did not change on any
individual lizard, nor did the individual bird bones change size or shape. As indicated above,
this is not true for whole frog specimens: how the specimen is positioned will determine what
the value of the measurement will be for several of the measurements (variables) commonly
taken for frog morphometric studies.


Pagano & Joly (1999) compared a select group of morphological measures on water
frogs with an analysis of allozymic markers. These authors concluded that frog morphology
was of limited use for their identification purposes. They determined frog body landmarks for
measurement points from digitized photographs of specimens. Data were input and analyzed
on a computer. Similar methodology has proved acceptable for characterization of stratigraphic
sections (see e.g., Benson et al., 1995), in which the surfaces are approximately
linear and two-dimensional. However, for examination of three-dimensional, soft-bodied
organisms, the use of such methods further complicates the measurement process. Despite the
stated advantage of magnification of digitized figures for measurement purposes, statistical
error minimization has not been proved to be achievable for measurements taken from frog
photographs. Based on our experience, we do not recommend using photographs of frogs
from which to take morphometric data.


One of us (CG) took a series of measurements on specimens of the frog species
Vanzolinius discodactylus (Anura, Leptodactylidae) from the Rio Juruá in Brazil to test the
riverine hypothesis of speciation (Gascon et al., 1996). Another of us (WRH) used the same
specimens in a study examining differentiation throughout the entire species range of V.
discodactylus (Heyer, 1997). WRH took the same set of measurements on the same frogs that
CG measured. The two data sets were given to LCH to analyze and evaluate. During the
course of this study, LCH reevaluated the statistical procedures and assumptions used in the
Gascon et al. (1996) study.

The objectives of this study are: (1) to evaluate inter- and intra-observer statistical
differences of measurement sets, (2) to understand the kinds of differences investigators create
when measuring frogs, (3) to evaluate the effect of measurement differences on certain
statistical procedures that are generally applied in frog morphometric studies, and (4) to judge
whether measurement differences yield different biological interpretations.


METHODS AND MATERIALS 


Fourteen measurements were made on each frog, following the methodology in Gascon
et al. (1996). The fourteen variables are: snout-vent length (SVL), nostril separation, eye
width anterior, eye width posterior, head width, head length, eye to nostril distance, tympanum
diameter (tympanum height of Gascon et al., 1996), eye length, thigh length (femur
length of Gascon et al., 1996), shank length (tibia length of Gascon et al., 1996), foot length,
maximum width of disk on third finger, and maximum width of disk on fourth toe.


Prior to WRH's taking of these data, he confirmed landmarks with CG for a subset of 
the variables in an attempt to make certain that the measurements would be comparable. 


CG and WRH measured each individual one time. 


CG used digital calipers linked to an IBM-PC; measurements were made to the closest
0.01 mm and the data were recorded with three decimal places. WRH used Helios dial
calipers; measurements were made to the closest 0.1 mm and the data were recorded with one
decimal place.


To assess individual measurement error, WRH measured one male, USNM 348976, 20
times over a 12-day period. The eye region on one side of the head is slightly squashed; otherwise
this specimen is in reasonable shape. The specimen is about average in overall state of
preservation and positioning in terms of ease of measurements. Measurements were taken at
various times of the day and were never taken one immediately after the other, to
eliminate or minimize carry-over effects of learning or memory. For SVL, efforts were made to
focus visually on the caliper jaws when measuring the specimen and not to look at the readout
dial until after the jaws had been set. All other measurements were taken under a dissecting
microscope with the calipers while the measurement readout dial was not visible in the field of
observation. Measurements were recorded on dated and timed separate, individual data sheets.


CG and WRH used different criteria to categorize sex of the individuals. CG used three
categories: F, M and 0. In cases where CG opened the frog to take tissues, sex and whether the
individual was adult or not were determined by the state of its gonads. Individuals recorded
as 0 were not opened. These data were recorded under field conditions. For the morphological
analyses reported by Gascon et al. (1996), data for adult and non-adult males were combined,
as were the data for adult and non-adult females. WRH used five categories: M, F, B, G and
J. The M (adult male) category was determined by presence of vocal slits in males. The F
(adult female) category was determined by presence of developed ova or some curliness of the
oviduct in females. The B (juvenile male) category was determined by presence of testes. The
G (juvenile female) category was determined by presence of ovaries. The J (juvenile) category
was used when sex could not be determined, either because the gonads were indeterminate in
very small specimens or the gonads had been removed from the specimens when tissues had
been taken. These data were taken in the laboratory with the aid of a Wild stereoscopic
dissecting microscope.


Male and female immature gonads of Vanzolinius discodactylus are quite similar in
appearance and difficult to differentiate without detailed examination under magnification.
Both ovaries and testes have a mosaic-like pattern externally. The only consistent difference
between immature gonads is that the testes have a smooth external surface, whereas ovaries
have an irregular external surface. Not surprisingly, the difficulty of differentiating gonads
using the unaided eye resulted in several different interpretations of sex by CG and WRH. The
differences are (CG determination, followed by WRH determination): INPA 2410 (F, B);
INPA 2371, 2433, 3397, 5605, 5671, 5728, 5735, 5736, 5799, 5801 (M, G); INPA 3572, 5571 (F,
J, gonads now removed in both); INPA 3177, 3573, 5524, 5592, 5670, 5697, 5730 (M, J, gonads
now removed in all).


WRH's categories of adult male (M) and adult female (F) are used in the analysis section
for both the CG and WRH measurement data sets unless otherwise noted. Using this
categorization, 88 adult individuals are available for analysis. Each variable was examined and
summarized separately for male and female adults. Graphs and descriptive statistics were
calculated and assumptions tested prior to means tests or predictive analyses. Logarithmic
transformations were performed and descriptive statistics calculated on the transformed
values as well. Tests of normality were performed and are discussed below.


In this study, we cannot calculate residual measurement error because we do not have the
“true” value of the variable for any individual specimen. Similarly, we are unable to assess a
statistical variability estimate for the factors involved in the overall measuring error. That is,
we cannot remove intra-observer variability from inter-observer measurement error. We
therefore evaluate the two factors separately.


We distinguish “precision” from “accuracy”. Accuracy is the closeness of an observer's
measurement to the quantity intended to be measured. In our case, this is unknown for the
true value of the frog's morphological measurement but can be evaluated by considering the
closeness of the results of the two observers' values. Precision refers to the entire class of
measurements and how well repeated measurements self-conform. In this case, the mean
value does not have to be the “true” value of the variable. To examine these characteristics we
calculated both inter- and intra-observer variability estimates and also descriptive measures
for qualitative evaluation of the frog data.


Data were analysed either using direct mathematical formulae or using the software
package SPSS 8.0 (Anonymous, 1998). Although the discriminant function analyses were
done using SPSS 8.0 (Anonymous, 1998), the figures were produced using either SYSTAT
versions 7 (Anonymous, 1997, for fig. 7) or 9 (Anonymous, 1999, for fig. 5-6).


THE APPROPRIATENESS OF RAW DATA TRANSFORMATION PROCEDURES
IN FROG MORPHOMETRIC STUDIES 


Gascon et al. (1996) used an allometric transformation procedure described by Thorpe
(1976) in an effort to remove size effects from the data. The Thorpe procedure (presented in
detail in Thorpe, 1975) involves two steps: (1) log-transforming the original measurement
data; and (2) transforming the log values using a common slope based on the entire data set.
The topic of transforming raw data is discussed first, followed by demonstration that the
statistical assumptions of the Thorpe procedure are not met by the Vanzolinius data as used by
Gascon et al. (1996).

Although not specifically mentioned by Gascon et al. (1996), the raw measurement data
were log-transformed as part of Thorpe's (1976) transformation procedure. Raw data are
transformed as a matter of course in many multivariate analyses of frog morphometric data
(for a recent example see Grein et al., 1997). Sokal & Rohlf (1969) state that log transformation
is the most common transformation for biological data and they provide a cogent
Fig. 2. Histogram of eye length values measured by CG on total sample of 131 frogs with normal
distribution best fit.


discussion on the topic of log-transforming variables as a way to meet some statistical test
assumptions that are not met by raw variable data. However, this transformation is often
applied routinely, when, in fact, it may be either unnecessary or incorrect to do so.


Replacing each measurement by its logarithm may result in more nearly equal
variances. Also, for many biological applications the data can be normalized by this change.
The assumption of concern for our purposes is whether the variables are normally distributed.
Using BESTFIT (Anonymous, 1995) on the data as analyzed by Gascon et al. (1996),
untransformed variables for the entire sample size of 131 individuals were fit with a normal
distribution (see fig. 2 for an example). We used the Anderson-Darling test criterion as well as
a chi-square test of fit. The Anderson-Darling criterion is more tail-sensitive than the
ordinary chi-square goodness-of-fit test.
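The tail sensitivity of the Anderson-Darling criterion can be illustrated with a minimal sketch. This is not the BESTFIT implementation; the function name, the two hypothetical samples, and the use of a small-sample adjustment of the form A²(1 + 0.75/n + 2.25/n²) are assumptions made for illustration, with the mean and standard deviation estimated from the sample.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(values):
    """Return (A2, adjusted A2) for a normal fit with mean and SD estimated from the data."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    # The (2i - 1) weights pair the lower and upper tails of the ordered sample.
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment commonly applied when both parameters are estimated.
    return a2, a2 * (1 + 0.75 / n + 2.25 / n ** 2)

# Hypothetical samples (not the Vanzolinius data): one near-normal, one strongly skewed.
z = [NormalDist().inv_cdf((i + 0.5) / 30) for i in range(30)]
near_normal = [3.7 + 0.25 * v for v in z]
skewed = [math.exp(v) for v in z]

a2_normal = anderson_darling_normal(near_normal)[1]
a2_skewed = anderson_darling_normal(skewed)[1]
print(round(a2_normal, 3), round(a2_skewed, 3))
```

A commonly tabulated 5% critical value for the adjusted statistic in this estimated-parameter case is about 0.752; the near-normal sample falls well below it and the skewed sample well above it.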


Sokal & Rohlf (1969) state that the log transformation may be appropriate and useful
when the means of the samples are proportional to the range or standard deviation of the
respective samples. The biological questions we are asking of the Vanzolinius data require
grouping of the data by locality. None of the variables, for the total sample or when organized
by locality, show a relationship of mean with either standard deviation (r = 0.06) or range
(r = 0.19). In addition, each raw variable plot shows approximate symmetry, lack of
prominent skewness and unimodality (for example, snout-vent length as shown in fig. 3).
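The check described above, whether sample means track sample standard deviations or ranges, can be sketched as follows. The grouping dictionary and its values are hypothetical; they are constructed so that spread is exactly proportional to the mean, which is the situation in which a log transformation would be indicated (the opposite of what we observe in our data).

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def mean_spread_correlations(groups):
    """Correlate group means with group SDs and with group ranges."""
    means = [mean(g) for g in groups.values()]
    sds = [stdev(g) for g in groups.values()]
    ranges = [max(g) - min(g) for g in groups.values()]
    return pearson_r(means, sds), pearson_r(means, ranges)

# Hypothetical localities whose spread grows in proportion to the mean.
offsets = (-0.10, -0.05, 0.0, 0.05, 0.10)
groups = {name: [m * (1 + d) for d in offsets]
          for name, m in (("loc1", 10.0), ("loc2", 20.0), ("loc3", 30.0))}

r_sd, r_range = mean_spread_correlations(groups)
print(round(r_sd, 3), round(r_range, 3))
```

Correlations near zero, as reported for our data, give no such indication.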


Thus, the data as analyzed by Gascon et al. (1996) can be appropriately analyzed as raw
variable measurements, rather than log-transformed variables. It is not incorrect statistically
to apply and use the logarithmic sample data for this problem. It is, however, unnecessary for
the morphological variables being measured here.


The reason Gascon et al. (1996) used logarithmic transformation was to attack the
problem of allometry effects in their data, which included both adults and juveniles. Thorpe
(1976) presented a procedure that uses a log transformation as an initial step toward
eliminating the influence of allometry. We examined the application of this approach and
found it inappropriate for the Vanzolinius data for the following reason.
Fig. 3. Histograms of SVL values measured by WRH, with a normal distribution curve superimposed,
on both raw and log-transformed data for 131 specimens (panels: SVL, untransformed data; SVL,
log-transformed data).


Following Thorpe (1975), Gascon et al. (1996) used the allometric transformation for
all variables to “remove size effects for the data”. Raw measurements were adjusted using a
common slope for all locality data sets and sexes combined. This procedure adjusts the
allometric character or variable by using the slope of its regression against size. When there
are multiple localities, as in the case of the present work, the pooled within-locality slope is
used to make the adjustment.


This procedure can be applied appropriately only when the regression slopes are approximately
parallel. That is, when a test of slope homogeneity (the first step in most packaged
ANCOVA programs) shows no significance, the slopes from the separate localities can be
pooled. For the 11 localities of this study, that is not the case. When there is heterogeneity, one
can do the calculations to obtain a common within-locality slope, but the resultant number is
meaningless. When the slope test indicates heterogeneity, as is the case for this data set (P =
0.001), there can be no one slope to describe the data (fig. 4). Therefore, the problem of size
effects in the Gascon et al. (1996) data would remain.
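The logic of the slope homogeneity check can be sketched with the standard ANCOVA sums of squares. This is an illustrative sketch, not the SPSS procedure we used; the function names and the two hypothetical localities are assumptions. It returns the separate slopes, the pooled within-locality slope, and the F statistic (with k - 1 and N - 2k degrees of freedom) that compares a common-slope fit against separate slopes.

```python
from statistics import mean

def sums_of_squares(x, y):
    """Corrected sums of squares and cross-products for one locality."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy

def slope_homogeneity(groups):
    """groups maps locality -> (log size values, log variable values)."""
    parts = {g: sums_of_squares(x, y) for g, (x, y) in groups.items()}
    slopes = {g: sxy / sxx for g, (sxx, sxy, syy) in parts.items()}
    tot_sxx = sum(p[0] for p in parts.values())
    tot_sxy = sum(p[1] for p in parts.values())
    tot_syy = sum(p[2] for p in parts.values())
    pooled = tot_sxy / tot_sxx  # pooled within-locality slope
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts.values())
    sse_common = tot_syy - tot_sxy ** 2 / tot_sxx
    k = len(groups)
    n = sum(len(x) for x, _ in groups.values())
    f_stat = ((sse_common - sse_separate) / (k - 1)) / (sse_separate / (n - 2 * k))
    return slopes, pooled, f_stat

# Hypothetical log-scale data for two localities with clearly different slopes.
groups = {
    "A": ([1, 2, 3, 4], [1.1, 1.9, 3.1, 3.9]),
    "B": ([1, 2, 3, 4], [2.1, 3.9, 6.1, 7.9]),
}
slopes, pooled, f_stat = slope_homogeneity(groups)
print(slopes, round(pooled, 2), round(f_stat, 2))
```

In this toy case the pooled slope lies between the two locality slopes and describes neither, which is the sense in which a common slope computed under heterogeneity is meaningless.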


We can eliminate the need to consider allometry by using only adults but we still need to
consider sexual size effects. If size effects are not present or if they can be removed statistically,
then male and female specimens can be pooled for analyses that can be more statistically
powerful. As stated by Gascon et al. (1996), the transformation manipulations they
(inappropriately) applied did remove size effects between males and females (which included both
immature and mature individuals) for all variables except head length. They deleted this
variable from their analyses and combined male and female data in their analyses. In our
analyses, we examine the sex differences on both raw and transformed variables using adults
only.


When the raw variables are examined using CG's classification of males and females (124
total) and his measurements: (1) all fourteen variables have non-significant univariate
homogeneity of variance tests (an assumption for means tests); (2) all univariate F-tests on
means are significant (P = 0.000); (3) the multivariate F is significant (P = 0.000,
Fig. 4. Fourteen regression slopes for morphometric variables, log-transformed, CG measurements,
for 131 specimens per variable (x-axis: SVL, log-transformed values; y-axis: variable values,
log-transformed).


Hotelling's T² = 1.075); and (4) homogeneity of slope is rejected (P = 0.000). No regression
effect could be determined and removed. Similar results hold for the logarithmic-transformed
data. Thus, it is inappropriate to assume that we can combine Gascon's raw male and female
data in univariate or multivariate analyses. We know of no valid procedure to remove the
sexual size differences under the conditions involved with this data set.


When WRH's raw data of 88 (57 female, 31 male) known adults are used: (1) all 14
variables have non-significant univariate homogeneity of variance test results; (2) all
univariate F-tests (t-tests, df = 86) are significant (P = 0.000); and (3) the multivariate F is
significant (P = 0.000). Similar results held for the log-transformed variables. In practice then,
because of equivalent results with this sample data, either the log-transformed or the raw data
could be used for further testing. However, it is an unnecessary complication for both
application and interpretation to transform a variable when the raw data can be used. We
continue with the raw data results for the 88 adult specimens, for which males and females test
significantly different on each of the measurements considered.
ANALYSIS OF MEASUREMENT DATA


EFFECT OF ROUNDING 


There are two components to consider when rounding a raw measurement value that 
could impact amphibian data sets: (1) pseudo-precision, and (2) the number of decimal places 
used by computers in calculating statistical algorithms. 


Pseudo-precision is using greater precision in calculations for measurements than can be
justified in terms of the originally recorded accuracy of those measurements. For example, if
multiple measures of the tympanum diameter of the same individual frog specimen are 2.165,
2.224, 2.187, 2.240, 2.193, the tympanum cannot be measured accurately beyond one decimal
place. Using values with two or three decimal places for these values is pseudo-precision.
Statisticians advise using precise measurements only (e.g., Sokal & Rohlf, 1969: 13-16).
Biological practitioners routinely ignore this advice. For example, although WRH uses
mechanical dial calipers that record measurements to the nearest tenth of a millimeter, in the
size range of Vanzolinius discodactylus, snout-vent length can be measured only to a precision
of 0.6 (see tab. 4). Thus, this variable should be recorded to the whole number, not with one
decimal place.
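The tympanum example can be worked through directly. The sketch below uses the five values quoted above; the function name and the criterion (the largest number of decimal places at which all rounded repeats coincide) are a simple heuristic assumed for illustration, not a rule taken from the statistical literature.

```python
from decimal import Decimal, ROUND_HALF_UP

# The five repeated tympanum diameters quoted in the text (mm).
repeats = [Decimal(v) for v in ("2.165", "2.224", "2.187", "2.240", "2.193")]

def agreed_decimals(values, max_places=3):
    """Largest number of decimal places at which all rounded values coincide."""
    for places in range(max_places, -1, -1):
        quantum = Decimal(1).scaleb(-places)  # 0.001, 0.01, 0.1, 1
        rounded = {v.quantize(quantum, rounding=ROUND_HALF_UP) for v in values}
        if len(rounded) == 1:
            return places
    return None  # the repeats disagree even as whole numbers

spread = max(repeats) - min(repeats)
places = agreed_decimals(repeats)
print(spread, places)
```

The repeats span 0.075 mm and agree only when rounded to one decimal place, which is the sense in which the second and third decimals are pseudo-precise.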

A second potential biological consequence results from the number of decimal places
computers use in calculations. This is less of a problem now with recent computer advances
in calculation. However, using pseudo-precise measurement data certainly can result in
different numerical values for test statistics, which are summarizations. To test whether any
biologically meaningful interpretations would be drawn from our data due solely to rounding
errors, paired t-tests were computed on two sets of the data. We compared the CG and WRH
measurements as recorded (WRH with one decimal place, CG with three) with both data sets
recorded to one decimal place. The CG data set was rounded by the usual method of rounding
up the i-th place when the (i+1)-th place is 5 or more.
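Readers repeating this step in software should note that not every environment rounds this way by default. As a hedged illustration (the helper name is hypothetical), Python's built-in round() resolves exact ties to the even digit, whereas the rule described above always rounds a trailing 5 upward; the decimal module can reproduce the latter.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places=1):
    """Round a decimal string so that a following digit of 5 or more rounds up."""
    quantum = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

# 2.25 is exactly representable in binary, so the tie is a true tie.
builtin_result = round(2.25, 1)          # ties-to-even
half_up_result = round_half_up("2.25")   # the rule described in the text
print(builtin_result, half_up_result)
```

Passing the measurement as a string avoids binary floating-point representation affecting which way a value such as 2.165 is rounded.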

As expected, when different numbers of decimal places are used (rounded vs. not) for the
data set, several of the resultant test values vary slightly. However, in no case are the decisions
different for the selected test level (0.05, 0.01, 0.001), nor would any different biological
inferences likely be drawn from the observed probability levels (tab. 1) from corresponding
tests.

While pseudo-precision, as a consequence of computer-generated or digital caliper
induced values, is biologically and statistically offensive, it does not impact seriously the
univariate descriptive or inferential results of real data sets such as ours for Vanzolinius
discodactylus.


INTER-OBSERVER DIFFERENCES


A battery of descriptive statistics was run on the raw measures of WRH-defined adults to
evaluate the nature of differences between the CG and WRH measurements (tab. 2). The
mean for each observer was calculated for each measurement. The usual assumptions for a
Table 1. Comparison using CG measurements at three decimal places and rounded off to one
decimal place to WRH measurements at one decimal place. Mean values reported at statistically
inappropriate 4 decimal place level to demonstrate effect of computation results. Columns
headed 0.000 use CG data as recorded; columns headed 0.0 use CG data rounded to one decimal
place. The coefficient of variation columns and the means not listed are illegible in the source.

Variable                | CG mean (0.000) | CG mean (0.0) | WRH mean    | t (0.000)   | t (0.0)     | P (0.000) | P (0.0)
Snout-vent length       | [illegible]     | [illegible]   | 30.2260     | -2.85       | -2.82       | 0.005     | 0.005
Nostril separation      | 2.7058          | 2.7053        | [illegible] | 5.98        | 5.92        | 0.000     | 0.000
Eye width anterior      | 5.8029          | 5.8000        | [illegible] | -7.67       | [illegible] | 0.000     | 0.000
Eye width posterior     | [illegible]     | [illegible]   | [illegible] | -9.79       | -9.79       | 0.000     | 0.000
Head width              | [illegible]     | [illegible]   | [illegible] | -0.57       | -0.48       | ns        | ns
Head length             | [illegible]     | [illegible]   | [illegible] | -14.57      | -14.66      | 0.000     | 0.000
Eye-nostril distance    | 3.5326          | [illegible]   | [illegible] | 3.21        | 3.36        | 0.002     | 0.001
Eye length              | 3.7019          | [illegible]   | [illegible] | 8.70        | 8.80        | 0.000     | 0.000
Tympanum diameter       | [illegible]     | [illegible]   | [illegible] | -16.67      | -16.09      | 0.000     | 0.000
Thigh length            | 12.8507         | [illegible]   | [illegible] | -3.99       | -4.02       | 0.000     | 0.000
Shank length            | [illegible]     | [illegible]   | [illegible] | 10.94       | 10.81       | 0.000     | 0.000
Foot length             | [illegible]     | [illegible]   | [illegible] | -12.87      | -12.86      | 0.000     | 0.000
Third finger disk width | [illegible]     | [illegible]   | [illegible] | -4.83       | -4.39       | 0.000     | 0.000
Fourth toe disk width   | 0.8215          | [illegible]   | [illegible] | -19.44      | [illegible] | 0.000     | 0.000
Table 2. Descriptive statistical differences between CG and WRH measurement data on the same
specimens of adult Vanzolinius discodactylus (n = 88). The per-observer mean and coefficient of
variation columns are illegible in the source.

Variable                | Hartley test | Paired t    | P (t)       | r           | P (r)       | Coefficient of determination
Snout-vent length       | 1.03         | -3.15       | 0.002       | 1.00        | 0.000       | 0.99
Nostril separation      | 0.81         | 5.24        | 0.000       | 0.75        | 0.000       | 0.57
Eye width anterior      | 0.99         | -6.47       | 0.000       | 0.87        | 0.000       | 0.76
Eye width posterior     | 0.97         | -7.30       | 0.000       | 0.92        | 0.000       | 0.84
Head width              | [illegible]  | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
Head length             | 0.94         | -12.42      | 0.000       | [illegible] | [illegible] | [illegible]
Eye-nostril distance    | 1.02         | -3.66       | 0.000       | [illegible] | 0.000       | [illegible]
Eye length              | 0.50         | [illegible] | 0.000       | 0.73        | 0.000       | [illegible]
Tympanum diameter       | 1.20         | -14.12      | 0.000       | 0.89        | 0.000       | 0.79
Thigh length            | [illegible]  | [illegible] | [illegible] | [illegible] | 0.000       | [illegible]
Shank length            | 1.05         | 8.74        | 0.000       | 0.99        | 0.000       | 0.98
Foot length             | 1.05         | -11.86      | 0.000       | [illegible] | 0.000       | 0.95
Third finger disk width | 0.79         | 2.83        | 0.006       | 0.73        | 0.000       | 0.53
Fourth toe disk width   | [illegible]  | [illegible] | 0.000       | [illegible] | 0.000       | [illegible]
Table 3. Performance rankings of measurement variables.

Variable                | Mean difference (adjusted) | Coefficient of variation | Difference in coefficients of variation | Hartley test
Snout-vent length       | Good                       | Best                     | Good                                     | Good
Nostril separation      | Moderate                   | Moderate                 | Good                                     | Moderate
Eye width anterior      | Moderate                   | Worst                    | Good                                     | Good
Eye width posterior     | Moderate                   | Moderate                 | Moderate                                 | Good
Head width              | Good                       | Best                     | Moderate                                 | Moderate
Head length             | Poor                       | Moderate                 | Poor                                     | Moderate
Eye-nostril distance    | Moderate                   | Moderate                 | Good                                     | Good
Eye length              | Poor                       | Worst                    | Poor                                     | Poor
Tympanum diameter       | Poor                       | Best                     | Good                                     | Moderate
Thigh length            | Moderate                   | Best                     | Good                                     | Good
Shank length            | Moderate                   | Moderate                 | Good                                     | Good
Foot length             | Moderate                   | Best                     | Good                                     | Good
Third finger disk width | Moderate                   | Best                     | Poor                                     | Moderate
Fourth toe disk width   | Poor                       | Best                     | Moderate                                 | Good


Student's t-test are not met because the observers measured the same sample and not samples
independently chosen at random. Because these are repeated measurements, the test statistic
denominator we use to test for a difference between each observer pair of measures is the
formula for the standard error of a difference when samples are not independent. That
formula is $s_{m(1)-m(2)} = \sqrt{s_{m(1)}^2 + s_{m(2)}^2 - 2 r s_{m(1)} s_{m(2)}}$, where $m(i)$, $i = 1, 2$, are the two
observers' means for the particular measurement, $s_{m(i)}$ is the standard deviation (standard error) of each mean, and $r$ is
the correlation between the two sets of paired measures. Alternatively, for $n$ pairs, the
standard deviation of the $d$ differences can be written $s_d = \sqrt{[\sum d^2 - (\sum d)^2/n]/(n-1)}$, and the
test statistic is $t = \bar{d}/(s_d/\sqrt{n})$.
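The two routes to the paired test statistic described above are algebraically equivalent, which the following sketch checks numerically. The function name and the six paired SVL values are hypothetical and serve only to illustrate the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_two_ways(a, b):
    """Paired t computed from (1) standard errors plus correlation, (2) the differences."""
    n = len(a)
    ma, mb = mean(a), mean(b)
    sa, sb = stdev(a), stdev(b)
    # Route 1: standard error of a difference between non-independent means.
    r = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / ((n - 1) * sa * sb)
    se_a, se_b = sa / sqrt(n), sb / sqrt(n)
    se_diff = sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)
    t_from_means = (ma - mb) / se_diff
    # Route 2: standard deviation of the paired differences d.
    d = [x - y for x, y in zip(a, b)]
    s_d = sqrt((sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1))
    t_from_differences = mean(d) / (s_d / sqrt(n))
    return t_from_means, t_from_differences

# Hypothetical SVL values (mm) for the same six frogs measured by two observers.
observer_1 = [30.1, 28.4, 31.2, 29.8, 27.5, 32.0]
observer_2 = [30.3, 28.5, 31.5, 29.9, 27.9, 32.1]
t1, t2 = paired_t_two_ways(observer_1, observer_2)
print(round(t1, 4), round(t2, 4))
```

Because the paired measures are highly correlated, the standard error of the difference is much smaller than it would be for independent samples, which is why small mean differences can test as highly significant.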


The paired t-test results indicate that all variables differ significantly except for one (head
width). The correlation coefficients are all statistically significant and most coefficients of
determination are high. The correlation statistics, considered with corresponding coefficient
of variation values, indicate that the two sets of observer measurements are consistent and
generally comparable. The t-test results leave no doubt, however, that overall, our two sets of
measurements differ statistically.

Given that our measurements are statistically different, we wish to explore our measurement
performance on a variable by variable basis. To do this, various ways of describing
performance are ranked and compared.


(1) Mean inter-observer difference of measures adjusted by magnitude of variable. The
intent of this comparison is to evaluate how well the two sets of measurements agree with each
other, specifically to see if the observers performed better on larger measurements than
smaller (e.g., snout-vent length (SVL) vs. width of third finger disk). The smaller mean value
for the same individuals and for each variable was subtracted from the larger. That number
was divided by the average value of the two means. The resultant values range from 0.002 to
0.126. For comparative ranking purposes, good is considered to be 0.000-0.005, moderate
0.005-0.050, and poor 0.050-0.150 (tab. 3).


(2) Coefficients of variation. Values of the coefficient of variation (CV) for each measure
are often used to compare the variability of the variables. Adjustments for sample size and
other factors have been suggested (e.g., Delaugerre & Dubois, 1985). We chose to use the
original formula and to categorize the CV values because, regardless of adjustment, the CV
remains extremely sensitive to errors in sample means. For evaluation and ranking purposes,
the best category, 5.0-6.0, has the lowest variability in the attribute measured; moderate is
6.0-7.0, and the worst category is 7.0-8.0 (tab. 3). Most of the coefficient of variation values
for each observer pair fall into the same categories (see tab. 2); in the few cases where our
values fell in different categories, the average of our values was used for category placement.


(3) Difference in coefficients of variation. The intent of this comparison is to evaluate
repeatability of our measurements. If each of us has the same degree of measurement
repeatability, our coefficient of variation values should be identical. Therefore, how different
these values are indicates degree of deviation from consistency of measurement for the
variable involved. For ranking purposes, good is a difference of 0.0-0.2, moderate is 0.3-0.5,
and poor is 0.6-1.5 (tab. 3).


(4) Hartley F-max test. The Hartley test statistic, which is the quotient of the larger and
the smaller variance, provides another way to evaluate repeatability of measurements. A
Hartley test value of 1 is not significant; values both larger and smaller indicate differences.
For ranking purposes, good is 0.9-1.1, moderate 0.8-0.9 or 1.1-1.2, and poor < 0.8 (tab. 3).
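The four performance measures above reduce to a few lines of arithmetic, sketched here under stated assumptions: the function names and the paired values are hypothetical, the coefficient of variation is expressed in percent (consistent with category bounds of 5.0 to 8.0), and the variance ratio is taken in a fixed observer order so that values below 1 are possible, as in the ranking bounds given in (4).

```python
from statistics import mean, stdev, variance

def relative_mean_difference(a, b):
    """Absolute difference of the two means divided by the average of the means."""
    ma, mb = mean(a), mean(b)
    return abs(ma - mb) / ((ma + mb) / 2)

def rank_relative_difference(value):
    """Categories from criterion (1): good, moderate, poor."""
    if value <= 0.005:
        return "good"
    return "moderate" if value <= 0.050 else "poor"

def cv_percent(values):
    """Coefficient of variation, in percent."""
    return 100 * stdev(values) / mean(values)

def variance_ratio(a, b):
    """Ratio of the first observer's variance to the second observer's variance."""
    return variance(a) / variance(b)

# Hypothetical measurements (mm) of the same four frogs by two observers.
first = [10.0, 11.0, 12.0, 13.0]
second = [10.1, 11.1, 12.1, 13.1]

rel = relative_mean_difference(first, second)
rank = rank_relative_difference(rel)
cv_gap = abs(cv_percent(first) - cv_percent(second))
ratio = variance_ratio(first, second)
print(round(rel, 4), rank, round(cv_gap, 3), round(ratio, 3))
```

A constant offset between observers, as in this toy example, moves the relative mean difference while leaving the variance ratio at 1, which is why the criteria are ranked separately.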


From the above (tab. 3), it is apparent that CG and WRH measured one variable
consistently and with the greatest precision, snout-vent length. There are five variables that we
measured with reasonable consistency and precision: head width, eye-nostril distance, thigh
length, shank length, and foot length. There are four variables that we apparently measured
differently, but each of us with reasonable to good precision: head length, tympanum
diameter, foot length, and width of fourth toe disk. Apparently we are using slightly different
landmarks for these measurements. For the tympanum, it would seem that CG's description
of tympanum height (Gascon et al., 1996) does in fact describe something different from
WRH's definition of tympanum diameter. Once these results became known, CG confirmed
that he always measured the vertical distance of the tympanum relative to head position and
WRH took the measurement at the point of greatest tympanum diameter, irrespective of
position of the tympanum relative to the head. For the width of the fourth toe, we obviously
used different criteria of how much contact of the disk with the calipers was used. The most
inconsistent measurement is eye length. That is, we measure the variable differently as well as
imprecisely. This suggests that this variable should not be used for morphometric analyses in
Vanzolinius discodactylus. We further suggest that because this variable is affected by
preservation artifact to a great degree, it should probably not be included in any frog morphometric
study.

There is one result we find surprising. Overall, we measured larger variables (such as
snout-vent length) equally as well (or poorly, depending on perspective) as smaller variables
(such as third finger disk width).
Table 4. - Descriptive statistics for 20 repeated measurements of a single specimen of Vanzolinius
discodactylus.

Variable | Minimum | Maximum | Mean | Standard deviation | Coefficient of variation
Snout-vent length | 26.1 26.7 26.4 0.16 0.01
Nostril separation | 2.8 2.2 0.09 0.04
Eye width anterior | 5.1 5.4 0.12 0.02
Eye width posterior | 7.4 7.1 0.17 0.02
Head width | 9.0 9.2 0.10 0.01
Head length | 10.0 10.3 0.16 0.02
Eye-nostril distance | 0.09 0.03
Eye length | 0.19 0.06
Tympanum diameter | 0.06 0.03
Thigh length |
Shank length |
Foot length |
Third finger disk width |
Fourth toe disk width |

INTRA-OBSERVER DIFFERENCES 


Standard descriptive statistics for the twenty repeated measurements on each morphological
variable of the single specimen (tab. 4) generally mirror inter-observer variation. That
is, SVL, which CG and WRH measured with greatest precision, has a low intra-observer
coefficient of variation. Eye length, which was the most imprecise inter-observer variable, has
the highest intra-observer coefficient of variation.
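As a worked illustration of the coefficient of variation reported in tab. 4, the sketch below divides the standard deviation by the mean. The function name is hypothetical; the only values taken from the paper are the snout-vent length standard deviation (0.16) and mean (26.4).

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean of zero makes the coefficient undefined")
    return stdev(values) / m


# Snout-vent length in tab. 4: standard deviation 0.16, mean 26.4.
svl_cv = round(0.16 / 26.4, 2)  # agrees with the tabulated 0.01
```

Because the coefficient is scaled by the mean, it allows the repeatability of small variables (such as disk widths) to be compared with that of large ones (such as snout-vent length).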

Given the thousands of frogs that WRH has measured, one would predict that there
would not be a change (improvement) in measurement accuracy from the first re-measure to
the twentieth. Two-sample t-tests of measurements 1-10 against measurements 11-20 were not
statistically significant, except for posterior eye width. Given that the eye that was measured
was misshapen with preservation, it is likely that the landmarks used by WRH changed over
the re-measurement process.
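The comparison of the first ten remeasurements with the last ten can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: it assumes a pooled-variance (equal variance) two-sample t statistic, a two-sided 5 % significance level, and hypothetical function names and data. The critical value 2.101 is the standard tabulated value of Student's t for 18 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5 % critical value of Student's t with 18 degrees of freedom.
T_CRITICAL_DF18 = 2.101


def pooled_t_statistic(first, second):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(first), len(second)
    pooled_var = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(first) - mean(second)) / standard_error


def drift_detected(remeasurements):
    """Compare remeasurements 1-10 with 11-20 of one variable."""
    if len(remeasurements) != 20:
        raise ValueError("expected exactly 20 remeasurements")
    t = pooled_t_statistic(remeasurements[:10], remeasurements[10:])
    return abs(t) > T_CRITICAL_DF18


# Hypothetical series (mm): one stable, one shifting after the tenth remeasure.
stable = [26.4, 26.3, 26.5, 26.4, 26.2, 26.6, 26.4, 26.5, 26.3, 26.4] * 2
shifting = [7.0, 7.1, 6.9, 7.0, 7.1, 6.9, 7.0, 7.1, 6.9, 7.0] + \
           [7.4, 7.5, 7.3, 7.4, 7.5, 7.3, 7.4, 7.5, 7.3, 7.4]
```

A detected shift, as with posterior eye width in this study, suggests that the landmarks changed during the remeasurement series rather than that precision improved.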


BIOLOGICAL INTERPRETATIONS OF MEASUREMENT DIFFERENCES 


INTER-OBSERVER DIFFERENCES


Our inter-observer differences over sets of measurements are unarguably statistically
different at highly significant levels, yet it does not necessarily follow that such inter-observer
measurement differences lead to different biological conclusions for the same set of specimens.
For example, it seems likely that some of our measurement differences are due to
consistent differences in the way we took the measurements. Given a large enough sample,
such differences would be statistically significantly different. However, because the
measurements would have been taken consistently by each observer, the variation described in the two
sets of measurements would be equivalent, and, hence, lead to similar conclusions for any
biological inferences drawn from the data. We test this idea using our measurement data in
two analyses aimed at obtaining insight into biological processes through analyses of
morphometric data.


Geographic variation 


Multivariate discriminant function analyses are often used to analyze patterns of
geographic variation in study organisms. For our purposes, we grouped the specimens from
Gascon et al.'s (1996: 377, fig. 1) eleven numbered localities into four major groups, separated
linearly along the Rio Juruá. Our Area 1 is Gascon et al.'s (1996) locality 1 (n = 7), Area 2 is
localities 2+3+8+9 (n = 20), Area 3 localities 4+5+10+11 (n = 50) and Area 4 localities 6+7
(n = 11). We use only WRH-defined adult specimen raw data (n = 88) in the analyses.
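The grouping of localities into areas can be written down as a small lookup table, which also makes it easy to confirm that the area sample sizes add up to the 88 adults analyzed. The variable and function names below are hypothetical; the locality numbers and sample sizes are those given in the text.

```python
# Localities of Gascon et al. (1996) assigned to each area, as described in the text.
AREA_LOCALITIES = {
    1: (1,),
    2: (2, 3, 8, 9),
    3: (4, 5, 10, 11),
    4: (6, 7),
}

# Adult sample sizes per area, as given in the text.
AREA_SAMPLE_SIZES = {1: 7, 2: 20, 3: 50, 4: 11}

# Reverse lookup: locality number -> area number.
LOCALITY_TO_AREA = {
    locality: area
    for area, localities in AREA_LOCALITIES.items()
    for locality in localities
}


def area_for_locality(locality):
    """Return the area (1-4) that a numbered locality belongs to."""
    return LOCALITY_TO_AREA[locality]


total_adults = sum(AREA_SAMPLE_SIZES.values())
```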


As described previously, the data for males and females are significantly different (P <
0.001). The values for each of the variables are assumed to have a multivariate normal
distribution with equal variance-covariance matrices (VCV) within the 4 areas. To decide
whether to combine the sexes, locality tests should be performed. However, all tests of VCV
equality are highly sensitive to normality. In addition, there is no practical, effective test for
multivariate normality for our smaller-sized samples. We can hypothesize that since the sexes
are highly significantly different over the entire sample then they should be different in and
over each area. Alternatively, we might not.


Let us use the untransformed measurements to examine the results of a discriminant 
analysis by sex. We use WRH's designations of adult males and females and compare final 
results when using each observer's measurements. 


For the female data (Area 1: n = 5; Area 2: n = 12; Area 3: n = 34; Area 4: n = 6; total N
= 57), the discriminant analysis results for each of the observer's data sets are far from
identical (tab. 5, fig. 5). Of particular interest is that, in the stepwise procedure, the variable
entered in the first step (that which explains the greatest amount of unconditional univariate
variance among area samples) differed, as did the variables used in the final model. Since the
variable impact differed between the two data sets in the discriminant model, it is not
surprising that there were differences in the values for the canonical functions, first axis
variable loading, and posterior classifications (tab. 5).

Male data (Area 1: n = 2; Area 2: n = 8; Area 3: n = 16; Area 4: n = 5; total N = 31) results
are similar (tab. 6, fig. 6) to the female results in the kinds of discrepancies that measurement
differences caused in the discriminant function analyses for the two sets of measurement data.


Would the different results from these analyses result in different biological
interpretations? One of the main methods for evaluating such geographic variation analyses is the plot
of the first two canonical axes. The discriminant function program in SYSTAT 9 (ANONY-
Table 5. - Comparison of discriminant function analysis results for female data of Vanzolinius discodactylus by
geographic regions, with two sets of measurements taken on the same individuals.

CG measurements | WRH measurements

Significant univariate F-test | Significant univariate F-test
SVL | SVL
Head length | Head length
Head width | Head width
Nostril separation | Nostril separation
Eye-nostril distance | Eye-nostril distance
Eye width anterior | Eye width anterior
Eye width posterior | Eye width posterior
Tympanum diameter | Tympanum diameter
Thigh length | Thigh length
Shank length | Shank length
Foot length | Foot length

Stepwise discriminant model | Stepwise discriminant model
First variable tried | First variable tried
Thigh length | Tympanum diameter
Final model uses | Final model uses
Posterior eye width | Posterior eye width
Head width | Shank length
Thigh length | Tympanum diameter
Foot length |
Eye length |
Third finger disk width |
All groups separable at 0.001 level in final model | Final model cannot separate Group 4 from Group 1

Canonical discriminant function | Canonical discriminant function
F # Eigenvalue % variation χ² | F # Eigenvalue % variation χ²
1 1.2801 0.55 0.000 | 1 1.1674 0.77 0.000
2 0.8429 0.36 0.000 | 2 0.2750 0.18 0.002
3 0.2028 0.09 0.052 ns | 3 0.0718 0.04 0.056 ns

First axis explanation | First axis explanation
Thigh length (0.94) | Tympanum diameter (0.93)
Head width (-0.66) | Posterior eye width (-0.58)
Foot length (0.63) | Shank length (0.51)
Third finger disk width (-0.52) |

Overall classification | Overall classification
Group % | Group %
1 100 | 1 80
2 88 | 2 83
3 69 | 3 83
4 30 | 4 67
Overall 77.4 | Overall 80.7
Fig. 5. - Discriminant function analysis results for female Vanzolinius discodactylus by geographic Areas
1-4 (see text for definition of areas). Panels: CG measurements (left) and WRH measurements (right);
horizontal axis of each panel: first canonical variable.


Fig. 6. - Discriminant function analysis results for male Vanzolinius discodactylus, by geographic Areas
1-4 (see text for definition of areas). Panels: CG measurements (left) and WRH measurements (right);
horizontal axis of each panel: first canonical variable.
Table 6. - Comparison of discriminant function analysis results for male data of Vanzolinius discodactylus by
geographic regions, with two sets of measurements taken on the same individuals.

CG measurements | WRH measurements

Significant univariate F-test | Significant univariate F-test
SVL | SVL
Head width | Nostril separation
Tympanum diameter | Eye-nostril distance
Thigh | Eye length
Shank | Tympanum diameter
Foot | Thigh
 | Shank
 | Foot
 | Third finger disk width
 | Fourth toe disk width

Stepwise discriminant model | Stepwise discriminant model
First variable tried | First variable tried
Tympanum diameter | Tympanum diameter
Final model uses | Final model uses
Head width | Head length
Eye width posterior | Eye length
Tympanum diameter | Eye width posterior
Shank | Shank
Final model cannot separate | Final model cannot separate
Area 2 from Area 3 | Area 1 from Area 3
Area 3 from Area 4 | Area 1 from Area 4

Canonical discriminant function | Canonical discriminant function
F # Eigenvalue % variation χ² | F # Eigenvalue % variation χ²
1 1.1000 0.49 0.000 | 1 4.4500 0.89 0.000
2 0.8908 0.40 0.001 | 2 0.5020 0.10 0.080
3 0.2416 0.11 0.0600 ns | 3 0.0272 0.01 0.708 ns

First axis explanation | First axis explanation
Head width (-1.97) | Head length (-2.99)
Shank (1.23) | Tympanum diameter (1.43)
 | Third finger disk width (1.55)
 | Eye length (0.91)

Overall classification | Overall classification
Group % | Group %
1 100 | 1 100
2 58 | 2 100
3 71 | 3 69
4 100 | 4 80
Overall 73.7 | Overall 80.7
mous, 1999) was used to produce fig. 5 and 6. The forward step option was used. The female
data differ in the distinctiveness of specimens from Area 1 (fig. 5) and might or might not be
given different biological interpretations by different researchers. For example, the CG results
(fig. 5, left) could be interpreted as supporting a model of more-or-less linear differentiation
along the river, whereas the WRH results (fig. 5, right) could be interpreted as showing slight
differentiation of samples without any geographic pattern evident. However, the very
different results in distinctiveness of male specimens from Area 2 (fig. 6) would certainly be given
different biological interpretations for the two data sets.


Riverine hypothesis of differentiation 


Gascon et al. (1996) used multivariate analyses of morphometric data to determine
whether there was a riverine effect on differentiation to compare with a data set derived from
allozymic variation. We use our measurement data to address the same question, but in a
slightly different way from the Gascon et al. (1996) approach.


There are two matched sets of localities immediately across the Rio Juruá available for
comparison. Our Group 1 (localities 2+3 of Gascon et al., 1996) is immediately across the
river from Group 2 (localities 8+9) and both are geographically separated from Group 3
(localities 4+5), which is immediately across the river from Group 4 (localities 10+11). Groups
1 and 3 are on the same riverbank, as are Groups 2 and 4. If the riverine hypothesis of
differentiation were operational, we would predict that there should be less differentiation
between Group 1 & 3 and 2 & 4 than between Group 1 & 2 and 3 & 4.
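The two predictions can be expressed as comparisons of distances between group centroids in canonical space. The sketch below is illustrative only: the function name and the centroid coordinates are hypothetical, Euclidean distance between centroids is assumed as the measure of differentiation, and the sketch reports each prediction as met or not met without an "equivocal" category.

```python
from math import dist


def riverine_predictions(centroids):
    """Check the two riverine-hypothesis predictions for four group centroids.

    `centroids` maps group number (1-4) to a tuple of canonical scores.
    Groups 1 and 3 share one riverbank; groups 2 and 4 share the other.
    """
    def d(a, b):
        return dist(centroids[a], centroids[b])

    return {
        "1&3 distance < 1&2 distance": d(1, 3) < d(1, 2),
        "2&4 distance < 3&4 distance": d(2, 4) < d(3, 4),
    }


# Hypothetical centroids on the first two canonical axes, with the river
# separating groups along the first axis.
example = riverine_predictions({1: (0.0, 0.0), 2: (3.0, 0.0), 3: (0.0, 1.0), 4: (3.0, 1.0)})
```

In this made-up configuration both predictions hold, because same-bank groups lie closer together than groups facing each other across the river.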


We used raw measurement data for adults and ran discriminant function analyses
separately on male and female adult WRH-defined specimens. Each observer data set was
used separately and results compared.


The sample size for males is 24 (Group 1: n = 3; Group 2: n = 5; Group 3: n = 4; Group
4: n = 12); for females it is 46 (Group 1: n = 10; Group 2: n = 2; Group 3: n = 6; Group 4: n =
28).

We also explore possible differences between discriminant function criteria. The Wilks'
criterion finds axes that account for the greatest separation among groups. The Mahalanobis
criterion finds axes that maximize pairwise separation of groups. In this case, the
Mahalanobis criterion model is more appropriate to test the riverine barrier hypothesis, as we are
interested in pairwise differences among the four groups we are analyzing.


For the Mahalanobis pairwise separation criterion, the 0.05 probability level was used
rather than the approximate F-level as the cutoff stepwise criterion for entrance and removal
of variables. For the Wilks' separation criterion, the approximate F-level for removal/entrance
was used.

The results (tab. 7-8) indicate pronounced differences due both to measurement
differences and to model differences. In no case is the set of variables used in the final model the same
for the CG versus WRH measurement data. As a consequence, none of the discriminant
function results are the same when the CG data set analyses are compared with the WRH data
set analyses. The differences due to model separation criterion differences are of the same
magnitude, however. For example, only in the case of the CG male data did the Mahalanobis
Table 7. - Comparison of discriminant function analysis results for male data of Vanzolinius discodactylus,
testing the riverine hypothesis, using two sets of measurements taken on the same individuals.

CG measurements | WRH measurements

Significant univariate F-test | Significant univariate F-test
SVL | SVL
Head length | Head length
Head width | Nostril separation
Eye-nostril distance | Eye-nostril distance
Eye width anterior | Eye length
Eye width posterior | Eye width anterior
Tympanum diameter | Eye width posterior
Thigh | Tympanum diameter
Shank | Thigh
Foot | Shank
 | Foot
 | Third finger disk width
 | Fourth toe disk width

Mahalanobis criterion, 0.05 probability cutoff

Stepwise discriminant model | Stepwise discriminant model
First variable tried | First variable tried
SVL | Eye-nostril distance
Final model uses | Final model uses
SVL | SVL
Thigh | Eye-nostril distance
Shank | Tympanum diameter
Fourth toe disk width | Foot
 | Third finger disk width
Final model separates all groups | Final model separates all groups
Significant canonical axes | Significant canonical axes
Three, explaining 100 % of variance | Three, explaining 100 % of variance
Overall classification | Overall classification
Group % | Group %
1 100 | 1 100
2 100 | 2 100
3 75 | 3 100
4 83 | 4 100
Overall 83.1 | Overall 100

Wilks' criterion, approximate F-level cutoff

Stepwise discriminant model | Stepwise discriminant model
First variable tried | First variable tried
Tympanum diameter | Tympanum diameter
Final model uses | Final model uses
Tympanum diameter | SVL
 | Eye-nostril distance
 | Tympanum diameter
 | Foot
 | Third finger disk width
Final model cannot separate | Final model separates all groups
Group 1 from Group 2 |
Group 1 from Group 3 |
Group 2 from Group 3 |
Significant canonical axes | Significant canonical axes
One, explaining 100 % of variance | Three, explaining 100 % of variance
Overall classification | Overall classification
Group % | Group %
1 | 1 100
2 80 | 2 100
3 0 | 3 100
4 83 | 4 100
Overall 88 | Overall 100
Table 8. - Comparison of discriminant function analysis results for female data of Vanzolinius discodactylus,
testing the riverine hypothesis, using two sets of measurements on the same individuals.

CG measurements | WRH measurements

Significant univariate F-test | Significant univariate F-test
SVL | SVL
Head length | Head length
Head width | Head width
Nostril separation | Nostril separation
Eye-nostril distance | Eye-nostril distance
Eye width anterior | Eye width anterior
Eye width posterior | Eye width posterior
Tympanum diameter | Tympanum diameter
Thigh | Thigh
Shank | Shank
Foot | Foot

Mahalanobis criterion, 0.05 probability cutoff

Stepwise discriminant model | Stepwise discriminant model
First variable tried | First variable tried
Thigh | Eye-nostril distance
Final model uses | Final model uses
Tympanum diameter | SVL
Thigh | Eye-nostril distance
Third finger disk width | Tympanum diameter
 | Shank
Final model cannot separate | Final model cannot separate
Group 1 from Group 2 | Group 1 from Group 2
Group 1 from Group 3 | Group 2 from Group 3
Significant canonical axes | Significant canonical axes
Three, explaining 100 % of variance | Three, explaining 100 % of variance
Overall classification | Overall classification
Group % | Group %
1 40 | 1 60
2 100 | 2 100
3 83 | 3 83
4 82 | 4 96
Overall 74 | Overall 87

Wilks' criterion, approximate F-level cutoff

Stepwise discriminant model | Stepwise discriminant model
First variable tried | First variable tried
Thigh | Tympanum diameter
Final model uses | Final model uses
Thigh | Tympanum diameter
Final model cannot separate | Final model cannot separate
Group 1 from Group 2 | Group 1 from Group 2
Group 1 from Group 3 | Group 1 from Group 3
Group 2 from Group 3 | Group 2 from Group 3
Significant canonical axes | Significant canonical axes
One, explaining 100 % of variance | One, explaining 100 % of variance
Overall classification | Overall classification
Group % | Group %
1 0 | 1 10
2 100 | 2 50
3 50 | 3 33
4 7 | 4 93
Overall 59 | Overall 65
Table 9. - Biological interpretations of morphometric data analyses relative to the riverine hypothesis. Distances
(i.e., amount of differentiation) based on centroids for first or first and second canonical functions. S,
data set supports prediction; R, data set rejects prediction; +, equivocal.

Predictions

Data set | Group 1&3 distance < 1&2 distance | Group 2&4 distance < 3&4 distance
♂ Wilks' criterion, CG | S | Ы
♂ Wilks' criterion, WRH | R | R
♂ Mahalanobis criterion, CG | R | +
♂ Mahalanobis criterion, WRH | R | +
♀ Wilks' criterion, CG | S | R
♀ Wilks' criterion, WRH | R | R
♀ Mahalanobis criterion, CG | B | R
♀ Mahalanobis criterion, WRH | R | R


criterion and Wilks' criterion models try the same variable first (thigh). In all other cases,
different variables were tried first under the two model criteria.


As the statistical results for these data (tab. 7-8) are quite different, it is no surprise that
their biological interpretations also differ. The results differ in whether they support
or reject the predicted pairwise differences among the four groups (tab. 9), not only due to
measurement differences, but to model criteria as well. It should be noted that this data set is
not as large as one would like to have strong confidence in the statistical model results.
However, for demonstrating inter-observer and inter-model effects, it is adequate.


INTRA-OBSERVER DATA 


The impact of individual measurement error in making biological interpretations of the
data is difficult to assess in general, but can be done within the context of specific analyses. We
examine the repeated measurements in the context of the male discriminant analysis of
geographic variation as an illustration. The 20 remeasurement values were incorporated into
the analysis but not for the production of the original discriminant function model. The
results were incorporated only in the final classification stage. The canonical discriminant
scores were plotted on the first two canonical axes (fig. 7). The results indicate that the
remeasurement variability can compromise the biological interpretation of the results. For
example, if the polygon encompassing the variation exhibited by the 20 remeasurements on
fig. 7 were transferred to the specimen at the top of the polygon encompassing individuals
from Area 2, the intra-individual measurement differences would then bridge the gap between
the polygons encompassing individuals from Areas 2 and 4. It is likely that without the
estimate of individual measurement error, the results would be interpreted as Area 2 being
distinct from a combination of Areas 1, 3 and 4. With the estimate of individual measurement
error, the results would be interpreted as all four areas demonstrating modest differentiation
from each other, with no real distinctions among them.

Fig. 7. - Discriminant function analysis results for male Vanzolinius discodactylus, by geographic Areas
1-4 (see text for definition of areas), WRH measurements, with remeasurement data (R) plotted
(remeasurement data incorporated only at final classification stage).
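The polygon argument can be approximated numerically. The sketch below is a deliberate simplification, not the graphical procedure used for fig. 7: instead of transferring a polygon, it compares the largest distance between any two remeasurement scores with the smallest distance between any specimen of one area and any specimen of another. All names and canonical scores are hypothetical.

```python
from itertools import combinations, product
from math import dist


def max_spread(points):
    """Largest distance between any two remeasurement scores."""
    return max(dist(p, q) for p, q in combinations(points, 2))


def min_gap(group_a, group_b):
    """Smallest distance between a specimen of one group and one of the other."""
    return min(dist(p, q) for p, q in product(group_a, group_b))


def error_can_bridge(remeasurements, group_a, group_b):
    """True if remeasurement spread is at least as large as the gap between groups."""
    return max_spread(remeasurements) >= min_gap(group_a, group_b)


# Hypothetical scores on the first two canonical axes.
remeasured = [(0.0, 0.0), (0.4, 0.9), (-0.3, 0.5), (0.2, -0.6)]
area_2 = [(3.0, 2.0), (3.5, 2.4), (2.8, 1.1)]
area_4 = [(2.6, -0.2), (3.1, -1.0), (2.2, -0.8)]
bridged = error_can_bridge(remeasured, area_2, area_4)
```

When the function returns `True`, an apparent gap between two areas is no wider than the scatter produced by remeasuring a single specimen, so the gap alone is weak evidence of differentiation.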

We inadvertently found that statistical results vary among versions of the same statistical
package. Fig. 7 was produced from SYSTAT version 7 (Anonymous, 1997); fig. 6 was
produced from SYSTAT version 9 (Anonymous, 1999). Note the different polygon shapes in
fig. 6 (right) and fig. 7, which preferably should be identical (excluding the remeasurement
polygon in fig. 7).


CONCLUSIONS AND RECOMMENDATIONS 


Intra- and inter-observer differences in recording frog measurement data can lead to
statistically significant differences in the variables. Because of the soft and flexible nature of
preserved frogs, measurements cannot be made with great precision, even by the same
individual.

Statistical modeling results of intra- and inter-observer differences in measurements may
well result in different biological interpretations, as demonstrated in this study.

The criteria chosen (for example, Mahalanobis or Wilks) for discriminant function
analysis can give different results for the same data, which in some cases would lead to
different biological interpretations. Researchers should be aware that either using the default
option or the only option available in any given discriminant function analysis software
program package may not be the most appropriate option for their data.


Source: MNHN, Paris


ALYTES 18 (3-4)


Bearing the above in mind, together with other general results discussed in this paper, we
offer the following recommendations:


(1) Use of eye length as a morphometric variable should be tested for measurement
precision before being used in a study. We recommend against using eye length without such
testing.


(2) Select the most appropriate statistical model options for the data being analyzed.
Different model options do give different results.


(3) Assume measurement differences between sexes in frogs and analyze data separately
by sex. Combine male, female and juvenile data only after statistical validation that it is
appropriate to do so.


(4) It is appropriate to include variables that are smaller (in terms of measurement
length) with larger variables in multivariate analyses of frog morphometric data.


(5) Pseudo-precision, while statistically and biologically indefensible, does not have a
meaningful impact on multivariate analyses of frog morphometric data. While we
recommend avoidance of pseudo-precision, there is no need to discredit studies characterized by
pseudo-precise data.


(6) Because frog measurements are not precise, but approximate, any biological infer- 
ences drawn from morphometric analyses of frogs must be based only on very robust effect 
size estimates and differences. With the use of even large or moderately large sample sizes only 
the most conservative interpretations of the analyses should be made. 


(7) Do not rush to logarithmic transformation. Measured morphological variables can
serve biology well without transformation. Scatter plots, histograms and comparisons with
the best-fitting normal distribution are tools to determine whether transformation is
necessary or not.
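A quick numerical companion to the graphical checks in recommendation (7) is to compare the skewness of a variable before and after log transformation. This is a swapped-in illustration, not a method used in this paper: sample skewness stands in for the visual comparison with a best-fitting normal distribution, and the function names and data are hypothetical.

```python
from math import log10
from statistics import mean, pstdev


def skewness(values):
    """Moment-based sample skewness (0 for a perfectly symmetric sample)."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    if s == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def compare_raw_and_log(values):
    """Return (skewness of raw values, skewness of log10-transformed values)."""
    return skewness(values), skewness([log10(v) for v in values])


# Hypothetical, already symmetric measurements: transformation makes matters worse.
raw_skew, log_skew = compare_raw_and_log([1.0, 2.0, 3.0, 4.0, 5.0])
```

If the raw values are already close to symmetric, as in this made-up example, the log-transformed values end up more skewed than the originals, which is one way of seeing that the transformation was not needed.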


(8) At least one individual in a frog study should be remeasured a number of times. These
remeasures should be included in analyses in the manner shown in fig. 7.


Hayek & Buzas (1997) deal with the issue of adequate sample size. They demonstrate
(1997: 66-69) that without any prior knowledge of the distribution or of any population
characteristics, a sample size of 20 will always be adequate. Intuitively, a minimum sample size
for characterizing 95 % confidence on measurement error should be less than 20 individuals.
Hayek & Buzas (1997: 69-70) discuss how assuming a normal distribution can reduce the
adequate sample size below 20. However, from a statistical perspective, because repeated
measurements by a trained observer will result in a relatively extreme variance estimate, the
alternative sample size estimates discussed by Hayek and Buzas will result in a size of less than
one individual. We therefore are left with the recommendation of a sample size of 20
remeasurements of a specimen for estimation of intra-individual measurement error unless or
until a measurement effect size is repeatedly calculated and comes into general use in the frog
research community.